Detecting occlusion of digital ink

ABSTRACT

An image processing apparatus is described comprising a processor configured to receive a video and digital ink annotated on the video. For at least a first frame of the video, the processor is configured to compute a model describing pixels of a bounding region of the ink. For a frame of the video, the processor is configured to compute a second region corresponding to the bounding region. The processor is configured to compute a comparison between the second region and the model and update the ink using the comparison.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation application of and claims priority to U.S. patent application Ser. No. 15/621,613, entitled “DETECTING OCCLUSION OF DIGITAL INK,” filed on Jun. 13, 2017, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

With current technology it is possible to ink, electronically (e.g. with a pen), on video. When the ink is intended to be applied to an object in the video, other objects in the video may move in front of (i.e. occlude) the inked object in some frames.

However, in current known methods of rendering the ink, the occluding objects are not taken into account and the ink is still rendered above the occluding objects. Thus, the ink is not rendered as a natural part of the scene.

In other known methods, a three-dimensional (3D) model of the whole scene is constructed from the frames of the video, in order to calculate the depth of the ink compared to other objects in the scene, and so render the ink at the correct depth. Constructing such a 3D model is computationally complex, and is therefore not a desirable method in devices having lower computational power (for example, mobile devices).

The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known image processing systems.

SUMMARY

The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Its sole purpose is to present a selection of concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.

An image processing apparatus is described, comprising a processor configured to: receive a video; receive digital ink annotated on the video; for at least a first frame of the video, compute a model describing pixels of a bounding region of the ink; for a frame of the video, compute a second region corresponding to the bounding region; compute a comparison between the second region and the model; and update the ink using the comparison.

Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.

DESCRIPTION OF THE DRAWINGS

The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:

FIG. 1 is a schematic diagram of an image processing system having an occlusion detecting component and an ink updating component;

FIG. 2 is a flow diagram of a method for detecting occlusion of digital ink in a digitally annotated video;

FIGS. 3A to 3C are schematic diagrams showing how digital ink is rendered in frames of a video by the image processing system of FIG. 1;

FIGS. 4A to 4C are schematic diagrams showing how digital ink is rendered in frames of a second video by the image processing system of FIG. 1;

FIG. 5 is a schematic diagram showing a set of sub-regions making up a bounding region of digital ink;

FIG. 6 is a flow diagram of an alternate method for detecting occlusion of digital ink in a digitally annotated video; and

FIG. 7 illustrates an exemplary computing-based device in which embodiments of an image processing system are implemented.

Like reference numerals are used to designate like parts in the accompanying drawings.

DETAILED DESCRIPTION

The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present examples are constructed or utilized. The description sets forth the functions of the example and the sequence of operations for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.

Although the present examples are described and illustrated herein as being implemented in an image processing system, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of image processing systems.

FIG. 1 is a schematic diagram of an image processing system 102 deployed at a computing device connected to a communications network 100. The image processing system 102 has an occlusion detecting component 104 which is able to estimate occlusion of digital ink in frames of a video, and an ink updating component 106 which is able to update digital ink based on occlusion information computed by the occlusion detecting component 104. The image processing system optionally has an object tracking component 108. In some examples the image processing system 102 is provided as a cloud service accessible to electronic devices such as smart phone 110, tablet computer 112, smart watch 114 or other electronic devices via communications network 100. In some cases the image processing system 102 is deployed at an electronic device such as smart phone 110 or another type of electronic device. The image processing system 102 is distributed between an electronic device 110, 112, 114 and a computing entity connected to communications network 100 in some examples.

In the example illustrated in FIG. 1 the smart phone has a video camera (not visible in FIG. 1) which has captured a video of a scene comprising a wall in the background. A user has annotated a frame of the video by drawing electronic ink (digital ink) on the wall. The video has been captured by a user holding the smart phone 110. In the video, a man walks in front of the wall. The image processing system 102 is used to detect when the man walks in front of a part of the wall which has the electronic ink applied to it, and thus which parts of the ink to hide (i.e. parts of the ink which are occluded by the man). For example, FIG. 1 shows a tablet computer 112 playing the video, with a different frame of the video visible than on the smart phone 110 of FIG. 1. In this frame the man does not occlude the ink and thus no parts of the ink are hidden by the image processing system 102. FIG. 1 also shows a smart watch 114 displaying another frame of the video in which the man is in front of a part of the wall on which the ink has been applied. In this case, the image processing system 102 detects that the man is occluding a part of the wall which has ink on it, and hides the occluded parts of the ink. Thus, the ink has the appearance of having the same depth as the wall in the video. The image processing system 102 estimates occlusion by computing a model describing the region of the wall on which the ink is applied, and compares pixels in a corresponding region in subsequent frames to the model. The comparison allows the image processing system 102 to compute an estimate of whether occlusion has occurred. This is because, if there is a difference between the second region and the model, it suggests that a foreground object has entered the scene depicted in the second region.

The image processing system 102 is computer implemented using any one or more of: software, hardware, firmware. Alternatively, or in addition, the functionality described herein is performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that are optionally used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs).

FIG. 2 is a flow diagram of a method for detecting occlusion of digital ink in a digitally annotated video. First, a video is received 201 by, for example, an image processing system 102, and digital ink (e.g. a group of digital ink strokes) annotated on the video is also received 203. The digital ink may have been previously applied to the video or a frame of the video by a user, for example.

A model is then computed 205, describing pixels of at least a first frame which are in a first bounding region containing the ink. In some examples the model comprises a single statistical model describing all of the pixels of at least the first frame in the bounding region. In other examples, the bounding region is divided up into a set of grid cells or sub-regions which make up the bounding region, and the model comprises a set of sub-models, each describing a respective grid cell of the bounding region. The grid cells may extend over m×n pixels (e.g. 1×3, 3×3, 32×32 etc.). In particular, in some examples, each grid cell or sub-region may cover one pixel (i.e. a 1×1 grid cell). The bounding region is, for example, a rectangle containing the ink, but any polygon with at least three vertices or any other closed shape containing the ink defines the bounding region in other examples. Various models/sub-models may be used to describe the pixels of the first bounding region, and some are described in more detail herein. A second region is then computed 207, corresponding to the bounding region, in a subsequent frame. This may be performed before or after (or at the same time as) computing the model of the first bounding region. Where the second region is computed at the same time as or before computing the model, in some examples information from the pixels of the second region is used to generate the model. A second region may be computed, for instance, where the digital ink has been applied to a background and the camera is moving; in that case it is necessary to track the movement of the background so that the ink is correctly “locked” to the background. Any object tracking algorithm may be used to track the background. For instance, in one example a template (i.e. a plurality of pixels) of a part of the bounding region is generated, and the subsequent frame is searched to find a matching region (i.e. of corresponding pixels) which most closely matches the template. This is done, for example, using a normalized cross correlation function or other comparison metric and results in a transformation consisting of a translation and a scale. Alternatively, in another example a set of keypoints (such as Oriented FAST and Rotated BRIEF (ORB) keypoints, or other local image features) is detected in the bounding region, and corresponding keypoints are found in the subsequent frame to compute a homography transformation. This is done, for example, using keypoint descriptor matching followed by random sample consensus (RANSAC) optimization or any other optimization algorithm. The ink and the model are mapped to the subsequent frame using the transformation based on the relationship between the template and the matching region (i.e. so that the ink is “locked” to the background).
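As an illustration of the keypoint-based tracking just described, the following is a minimal Python sketch using OpenCV's ORB detector, brute-force descriptor matching and RANSAC homography estimation. The function name and parameters are illustrative only, and for brevity keypoints are detected over the whole frame rather than restricted to the bounding region:

```python
import cv2
import numpy as np

def track_bounding_region(prev_frame, next_frame, region_corners):
    """Estimate where the ink's bounding region moved between two frames.

    region_corners: 4x2 float32 array of region corners in prev_frame.
    Returns the corners mapped into next_frame via a RANSAC homography,
    or None if too few keypoint matches are found.
    """
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(prev_frame, None)
    kp2, des2 = orb.detectAndCompute(next_frame, None)
    if des1 is None or des2 is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 4:  # a homography needs at least 4 correspondences
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Map the bounding region (and hence the ink and model) into the new frame.
    return cv2.perspectiveTransform(region_corners.reshape(-1, 1, 2), H)
```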

Next, a comparison is computed 209 between the second region and the model (i.e. between the pixels and the model or the relevant sub-model for that pixel), to decide which regions of the second region are occluded regions (i.e. which parts of the ink are to be hidden). In some examples, the comparison comprises a similarity value between pixels of the second region and the model. An example of the comparison is described in more detail below. The model may optionally be updated with those parts of the second region which are not occluded regions. This gives the benefit that the model is updated over time as the process repeats, and takes into account gradual changes in the video such as illumination changes.

Finally, the ink is updated 211 using the results of the comparison, so that regions of ink which are occluded by a foreground object are hidden, in order to give the ink the same “depth” as the background object on which it was applied. The ink is overlaid on top of the video as originally recorded, and so without the above method being applied, the ink appears on top of both the foreground and background objects. Thus, when the ink is updated using the results of the comparison, the ink is hidden in areas where foreground objects are detected, in order to give the ink depth compared to the foreground objects in the video. In some examples, the results of the comparison are improved using segmentation techniques such as graph cut based methods or other segmentation methods and refined by local filtering. Any local filtering can be used and in an example, cross bilateral filtering is used whereby the mask (regions of hidden ink) is filtered using the color video frame as a guide.
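The ink update step can be pictured as alpha compositing in which the occlusion estimate attenuates the ink's opacity. The sketch below assumes a per-pixel occlusion probability map in [0, 1] as computed later in this description; the names and array shapes are illustrative:

```python
import numpy as np

def composite_ink(frame, ink_rgba, occlusion_prob):
    """Overlay ink on a video frame, hiding it where occlusion is detected.

    frame:          HxWx3 uint8 video frame.
    ink_rgba:       HxWx4 uint8 ink layer with an alpha channel.
    occlusion_prob: HxW float array in [0, 1]; 1 means the pixel is occluded.
    """
    alpha = ink_rgba[..., 3:4].astype(np.float32) / 255.0
    # Attenuate the ink's alpha by the probability that the pixel is occluded,
    # so fully occluded ink disappears and uncertain pixels are semi-hidden.
    alpha = alpha * (1.0 - occlusion_prob[..., None])
    ink_rgb = ink_rgba[..., :3].astype(np.float32)
    out = alpha * ink_rgb + (1.0 - alpha) * frame.astype(np.float32)
    return out.astype(np.uint8)
```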

In some examples, the method outlined above is extended to apply separately to any number of ink strokes in the video.

Various models/sub-models are used to describe the pixels of a frame which are in a bounding region containing the ink. For instance, in some examples the technology uses a statistical sub-model for each pixel in the bounding region which is updated for each consecutive frame in the video. The set of sub-models, combined, makes up a model describing the bounding region. For such a statistical sub-model, the region having the digital ink should not be covered by any foreground objects in at least one frame of the video. If that uncovered frame is not the first frame (i.e. the frame in which the ink was applied), the method for detecting occlusion is still effective, so long as additional information is provided to determine which frames do not have the occlusion. In many cases the majority of the frames do not have occlusion, and in some examples it is automatically determined which frames to use when creating the statistical sub-models.

In some examples, a plurality of statistical sub-models is computed, each for an individual pixel of the bounding region. An example of such a statistical sub-model is defined as a normal distribution with the probability density function (PDF):

$f\left(x \mid \mu, \sigma^{2}\right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{\left(x - \mu\right)^{2}}{2\sigma^{2}}}$

where μ is the mean or expectation of the distribution (and also its median and mode), σ is the standard deviation, σ² is the variance, and x is the intensity of the pixel being considered. In some examples, the intensity is the greyscale intensity whilst in others it is the color intensity of one color channel of the pixel. In yet further examples, the intensity is generalized to any number of channels by multiplying the PDFs of the individual channels to obtain the total PDF. The PDF indicates how likely it is that a pixel value belongs to the previously observed pixel values for a given pixel. In one example the channels correspond to the color channels of the video frame. In other examples the channels are features, such as gradients, gradient magnitudes, gradient angles or edge responses, computed based on a pixel neighborhood.
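As a concrete reading of the PDF above, the following sketch evaluates the per-pixel normal density over a whole region in one vectorized call; the multi-channel case multiplies the per-channel densities as the text describes. The function name is illustrative:

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density f(x | mu, sigma^2) of intensity x under a per-pixel model.

    x, mean and var may all be HxW arrays (one sub-model per pixel), so the
    whole bounding region is evaluated at once.
    """
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Multi-channel generalization: multiply the per-channel PDFs, e.g.
# total = normal_pdf(r, mu_r, v_r) * normal_pdf(g, mu_g, v_g) * normal_pdf(b, mu_b, v_b)
```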

The sub-model is initialized by setting the mean to an observed pixel value (such as from an early frame of the video). The variance is initially set to a default value which is determined empirically or selected according to the types of image data to be processed. In some examples, to obtain a more robust estimate of the model variance, the variance is initialized as the average of the squared differences of the pixels in the region having approximately the same gradient magnitude as the pixel for which the model is being defined.

To compute the comparison between the second region and the model using the set of sub-models described above, the likelihood that a pixel of the subsequent frame is a foreground pixel, and thus that digital ink on the pixel should be hidden due to occlusion (or in other words, the similarity value), is defined as:

$p\left(x \mid t\right) = 1 - f\left(x \mid \mu, \sigma^{2}\right)$

This is expressed in words as: the probability that a given pixel location with intensity x in time frame t is a member of the foreground is equal to one minus a probability density function describing previously observed intensity values of pixels at the same location in other frames of the video.

The comparison thus generates an occlusion probability map comprising values p(x|t) for each pixel, calculated from the similarity values of the individual pixels of the second region. In some examples, the probability map is applied to the ink to update it, so that ink is partially hidden at a magnitude proportional to the probability value. In other examples, a system is configured to hide any ink on pixels whose probability of being a foreground pixel is above a predetermined threshold value (for example about 0.7, corresponding to 1.5 standard deviations; however, note that other values can be used depending on the types of images and the particular capture devices and processors being used). A pixel is considered to be in the background if the likelihood is below the predetermined value. The predetermined threshold differentiates the pixels between foreground and background pixels, and thus generates an occlusion map indicating pixels of the second region which have a similarity value below a predetermined threshold value. Optionally, to reduce the effect of exposure changes between video frames, such as changed exposure time and white balance, adjustments of model parameters are made based on histogram matching between the first and the second video frame.
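A minimal sketch of this comparison step follows, assuming per-pixel mean and variance arrays as above. Note that a density can exceed 1 for small variances, so the sketch clips the probability to [0, 1]; that detail is an assumption not spelled out in the text:

```python
import numpy as np

def occlusion_probability_map(intensity, mean, var, threshold=0.7):
    """Compute p(x|t) = 1 - f(x | mu, sigma^2) per pixel and threshold it.

    Returns the probability map and a binary occlusion map marking pixels
    whose foreground probability exceeds the threshold (0.7 is the example
    value from the text; it should be tuned per capture device).
    """
    f = np.exp(-((intensity - mean) ** 2) / (2.0 * var)) \
        / np.sqrt(2.0 * np.pi * var)
    prob = np.clip(1.0 - f, 0.0, 1.0)
    occluded = prob > threshold
    return prob, occluded
```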

Optionally, to reduce noise in the detection of foreground objects (i.e. occluding objects) a spatial filtering is applied over neighboring pixels (for instance 5×5 neighboring pixels or other numbers of neighboring pixels). In some examples, the spatial filtering of the probability map is a cross bilateral filtering summarized as:

$p_{filtered}\left(x \mid t\right) = \frac{1}{W_{p}} \sum\limits_{x_{i} \in \Omega} p\left(x_{i} \mid t\right)\, f\left(I\left(x_{i} \mid t\right) - I\left(x \mid t\right)\right)\, g\left(x_{i} - x\right)$

where the summation is over a window Ω centered on x (e.g. a 5×5 window), W_p is a normalization term, f is a range kernel (e.g. a Gaussian function) acting on the video frame I, and g is a spatial kernel (e.g. a Gaussian function). The above mathematical expression is expressed in words as: the probability of a pixel at a given location in a frame at time t of the video, after cross bilateral filtering has been applied, is equal to a weighted average of the probabilities in a neighborhood, where the weights also depend on the color pixel differences in the video frame. In some examples, p_filtered is a non-weighted average of the probabilities in a neighborhood. Again, a pixel is considered to be in the foreground if the filtered likelihood is above a predetermined value. For instance, in some models the value may be 0.7 (within 1.5 standard deviations), but other values can be used depending on the particular types of capture devices, the processors used and other factors. A pixel is considered to be in the background if the filtered likelihood is below the predetermined value. In other words, foreground pixels are hidden based on the probability map. In some examples, neighboring foreground pixels are subsequently grouped using connected component labeling to further filter the map and avoid false detections of small areas of foreground pixels. Optionally, the ink is hidden only if a group of foreground pixels is larger than about 5% of the bounding region.
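A direct (unoptimized) rendering of the cross bilateral filter above might look as follows; the window size and kernel widths are illustrative, and a production implementation would vectorize the loops or call a library routine:

```python
import numpy as np

def cross_bilateral_filter(prob, guide, radius=2, sigma_r=10.0, sigma_s=2.0):
    """Smooth the occlusion probability map, guided by the video frame.

    prob:  HxW float probability map p(x|t).
    guide: HxW float guide image (e.g. greyscale intensities of frame I).
    radius=2 gives the 5x5 window used as an example in the text.
    """
    h, w = prob.shape
    out = np.zeros_like(prob)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Range kernel f: penalizes guide-image intensity differences.
            f = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2)
                       / (2.0 * sigma_r ** 2))
            # Spatial kernel g: penalizes distance from the center pixel.
            yy, xx = np.mgrid[y0:y1, x0:x1]
            g = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * sigma_s ** 2))
            weights = f * g  # the normalization term W_p is weights.sum()
            out[y, x] = (weights * prob[y0:y1, x0:x1]).sum() / weights.sum()
    return out
```

The subsequent connected component grouping can be done with a standard labeling routine (for example OpenCV's cv2.connectedComponentsWithStats), discarding foreground groups smaller than the chosen fraction of the bounding region.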

The above-described model may be expanded so that, instead of creating a statistical sub-model for each pixel, the region is divided into a grid with m×n pixels in each cell, and a model similar to the above is generated for each cell (i.e. by using the same statistical method, with the additional step of averaging over the m×n pixels in each frame, to find the mean and variance). In other words, the model comprises a plurality of sub-models describing pixels in a plurality of sub-regions (each grid cell) making up the bounding region. In the subsequent frames, each pixel is compared to the sub-model of whichever cell it belongs to. This is done to improve performance and robustness against distortions near sharp edges in the video. In the examples where each sub-model describes a grid of pixels (as opposed to a single pixel, or a 1×1 grid of pixels), the mean and variance need not be initialized from a single observation or a default value, as there are sufficient pixels in each grid cell to compute a mean and variance.
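Computing per-cell means and variances reduces to block-averaging the region, for example as in this sketch (which assumes, for brevity, that the region dimensions are divisible by the cell size):

```python
import numpy as np

def grid_cell_models(region, cell=(3, 3)):
    """Compute a mean/variance sub-model for each m x n grid cell.

    region: HxW float array of pixel intensities in the bounding region.
    Returns (means, variances), each of shape (H//m, W//n), one entry
    per grid cell.
    """
    m, n = cell
    h, w = region.shape
    cells = region.reshape(h // m, m, w // n, n)
    means = cells.mean(axis=(1, 3))
    variances = cells.var(axis=(1, 3))
    return means, variances
```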

For any sub-model describing any size of grid cell (e.g. a single pixel or a number of pixels), the set of sub-models may optionally be further updated using pixels from subsequent frames which have not been found to be foreground pixels (as the foreground pixels are not representative of the background). More specifically, each sub-model may be further updated using pixels in the relevant grid cell from subsequent frames which have been found to be background pixels.

For each consecutive frame, and for pixels that have been identified as background pixels, the relevant sub-model is updated according to the learning algorithm (i.e. an algorithm that evolves the model to adapt to the subsequent frame):

$\mu(t) = \left(1 - \alpha\right)\mu\left(t - 1\right) + \alpha\,\frac{1}{n}\sum\limits_{i = 1}^{n} x_{i}(t)$ and $\sigma^{2}(t) = \left(1 - \alpha\right)\sigma^{2}\left(t - 1\right) + \alpha\,\frac{1}{n}\sum\limits_{i = 1}^{n}\left(x_{i}(t) - \mu(t)\right)^{2}$

where t is the time and α is the learning rate, chosen depending on how rapidly the sub-model should update. The above learning algorithm updates the mean of the probability density function of the sub-model (for a specified pixel location) at the video frame for time t by setting it equal to one minus the learning rate, times the mean of the probability density function of the sub-model at the video frame for time t minus 1, plus the learning rate, times the average intensity of the corresponding pixels in the frame at time t that have been identified as background pixels in the video frame for time t. (In the case of a single pixel or 1×1 grid cell, the average reduces to the intensity of the single corresponding pixel in the frame at time t, provided it has been identified as a background pixel.) The above learning algorithm updates the variance of the probability density function of the model at the video frame for time t (for a specified pixel location or grid cell) to be equal to one minus the learning rate, times the variance of the probability distribution for the specified pixel location or grid cell at video frame t minus 1, plus the learning rate, times the average squared difference between the intensities of the pixels in the new frame that have been identified as background pixels and the mean intensity.
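The running update can be written compactly; the sketch below applies the two equations above to one sub-model, using only the pixels of its cell that were classified as background in the current frame (all names are illustrative):

```python
import numpy as np

def update_submodel(mean, var, background_samples, alpha=0.05):
    """Apply the learning update to one sub-model's mean and variance.

    background_samples: 1-D array of this frame's intensities in the cell
    that were classified as background; alpha is the learning rate.
    """
    if background_samples.size == 0:
        return mean, var  # whole cell was occluded; leave model unchanged
    new_mean = (1 - alpha) * mean + alpha * background_samples.mean()
    new_var = (1 - alpha) * var \
        + alpha * ((background_samples - new_mean) ** 2).mean()
    return new_mean, new_var
```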

In the case where the video has been captured with a freely moving camera, the above model (comprising the sub-models) may still be used. As outlined above, a homography between each frame is found for the bounding region, and the bounding region is moved (along with the statistical models) to compensate for the camera motion. If large camera motion is detected, then in some examples a higher learning rate α is used to compensate for additional distortions in the statistical model due to the motion.

In some examples, an alternative method of modeling the pixels of the frames is used. In one example, data is collected from substantially the whole video sequence. A statistical model is constructed using substantially all of this data. In the next step occlusion is detected by looking back at each frame and making a decision about which pixels of which frame are occluded. This may be performed using methods similar to those described herein (for example generating a probability distribution as described herein, and marking pixels in each frame as occluded or not occluded using the probability distribution of the data). This method negates the need to update the model between each frame.

In other examples, the pixels are modelled using a Gaussian mixture model of at least one color channel of grids of pixels in the bounding region. In the model, for each grid cell a sub-model using one normal distribution is estimated. The model is estimated by using a best-fit algorithm. If the fit is worse than a threshold value, then a second normal distribution is added to the model and the distribution is updated with the best fit. This may be beneficial for scenes with small flickering details. The Gaussian mixture model is particularly useful for examples where larger (e.g. 3×3 and above) grid cells are used, or for videos having a number of frames that are known to have no occlusion in them, in order to be able to model a sufficient number of pixels for each sub-model. In this model, it is determined whether a pixel is in the foreground or the background by testing pixels in the subsequent frame. Each pixel is tested for the probability that it belongs to each of the normal distributions of the Gaussian mixture model. If the probability is below a threshold for each of the normal distributions (for example, indicating 1.5 standard deviations, although other values can be used depending on the types of images and the particular capture devices and processors being used), then it is decided that the pixel does not belong to any background class and is thus a foreground pixel. The accuracy of the Gaussian mixture model may be improved by also weighting the different distributions of the model based on the frequency with which the background pixels belong to each distribution (estimated by the number of samples).
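Under the 1.5-standard-deviation reading of the threshold, the mixture test reduces to checking whether a pixel is far from every background component, as in this illustrative sketch:

```python
import numpy as np

def is_foreground_gmm(x, means, stds, k=1.5):
    """Test a pixel against each component of a per-cell Gaussian mixture.

    means, stds: 1-D arrays with one entry per normal distribution in the
    mixture. The pixel is foreground only if it lies more than k standard
    deviations from every background component (k=1.5 per the example in
    the text; other values may suit other capture devices).
    """
    z = np.abs(x - means) / stds
    return bool(np.all(z > k))
```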

FIGS. 3A to 3C are schematic diagrams showing how digital ink is rendered in frames of a video 300 by the image processing system of FIG. 1 using the method described in FIG. 2. FIG. 3A shows a first frame 301A, wherein digital ink 304 has been applied to a wall 302 in the background. The image processing system 102 computes a bounding region 305 containing the digital ink 304. In a subsequent frame 301B, a man 306 walks into view and into the bounding region 305, but does not occlude any part of the wall 302 which has ink 304 applied to it. The image processing system 102 does not hide any of the ink. In a third frame 301C, the man 306 walks in front of a part of the wall 302 which has ink 304 applied to it. The image processing system 102 detects the pixels of the wall 302 which have ink 304 on them and which have been occluded by the man 306. As a result the ink 304 is updated to hide the parts of the ink 304 that are on occluded parts of the wall 302.

FIGS. 4A to 4C are schematic diagrams showing how digital ink is rendered in frames of a second video 400 by the image processing system of FIG. 1, wherein cells of multiple pixels are modelled for each sub-model instead of a single pixel. Similar to the video 300 shown in FIGS. 3A to 3C, FIG. 4A shows a first frame 401A, wherein digital ink 404 has been applied to a background 402. The image processing system 102 computes a bounding region 405 containing the digital ink 404. In a subsequent frame 401B, an object 406 moves into view, and into the bounding region 405, but does not occlude any part of the background 402 which has ink 404 applied to it. The image processing system 102 does not hide any of the ink. In a third frame 401C, the object 406 moves in front of a part of the background 402 which has ink 404 applied to it. The image processing system 102 detects the pixels of the background 402 which have ink 404 on them and which have been occluded by the object 406. As a result the ink 404 is updated to hide the parts of the ink 404 that are on occluded parts of the background 402.

FIG. 5 shows how a bounding region 405 of the ink 404 is divided up into sub-regions 500 n, each a grid cell containing a plurality of pixels. A sub-model is computed for each sub-region as outlined above. The values of the sub-models may be interpolated at the boundaries of the sub-regions 500 n to mitigate discontinuities or abrupt changes.
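One way to realize this interpolation is to bilinearly upsample the per-cell sub-model values (or the per-cell comparison results) back to pixel resolution, so values vary smoothly across cell boundaries. A sketch, assuming OpenCV's resize routine:

```python
import cv2

def upsample_cell_values(cell_values, frame_height, frame_width):
    """Bilinearly interpolate per-cell values (e.g. means or comparison
    scores, as a float NumPy array) to per-pixel resolution, smoothing
    sub-region boundaries."""
    return cv2.resize(cell_values, (frame_width, frame_height),
                      interpolation=cv2.INTER_LINEAR)
```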

FIG. 6 is a flow diagram of an alternate method for detecting occlusion of digital ink in a digitally annotated video. As in the case of the method shown in FIG. 2, first, a video is received 601 by, for example, an image processing system 102, and digital ink (e.g. a group of digital ink strokes) annotated on the video is also received 603. The digital ink may have been previously applied to the video or a frame of the video by a user, for example.

A model is computed 605, describing pixels of at least a first frame which are in a bounding region containing the ink. In some examples the model comprises a single statistical model describing substantially all of the pixels of at least the first frame in the bounding region. In other examples, the bounding region is divided up into a set of grid cells which make up the bounding region, and the model comprises a set of sub-models, each describing a respective grid cell of the bounding region. The grid cells may extend over m×n pixels (e.g. 1×3, 3×3, 32×32 etc.). In particular, in some examples, each grid cell may cover one pixel (i.e. a 1×1 grid cell). The bounding region is, for example, a rectangle containing the ink, but in other examples any polygon with at least three vertices or other closed shape containing the ink defines the bounding region. Various models/sub-models may be used to describe the pixels of the first frame; some are described herein with respect to FIG. 2 and are also applicable to the method of FIG. 6. A second region is then computed 607, corresponding to the bounding region, in a subsequent frame. This may be performed before or after (or at the same time as) computing the model (e.g. comprising a set of sub-models) of the first bounding region. Where the second region is computed at the same time as or before computing the model, in some examples information from the pixels of the second region is also used to generate the model. A second region may be computed, for instance, where the digital ink has been applied to a background and the camera is moving; in that case it is necessary to track the movement of the background so that the ink is correctly “locked” to the background. Any object tracking algorithm may be used to track the background, as described previously with respect to FIG. 2.

Next, it is decided 609 whether the occlusion detection mechanism should be aborted. Certain criteria regarding the video data may be checked to determine whether it should be aborted or not. The criteria may be indicative of the occlusion mechanism being effective (in other words, if the criteria are not fulfilled then it may be indicative of the method not being successfully completed, or having poorer results). In some examples, this may be when the algorithm used for generating the second bounding region fails. In other examples, it may be that the matching algorithm used to track the movement of the background gives a matching score below a selected threshold value. In other examples, it may be that global properties of the subsequent frame differ (e.g. if a mean value of the whole frame changes more than a threshold value), which is indicative of a situation where the occlusion detection mechanism will not be accurate. In further examples, it may be that the variance of the model is greater than a threshold value, or the variances of a threshold number (for example 50%) of grid cells are greater than a threshold value, which is indicative of an inaccurate model of the background. In some examples, it is decided whether the occlusion detection mechanism should be aborted after generating 614 a comparison between the second bounding region and the model. In those examples, it may be that the proportion of pixels determined as foreground pixels exceeds a threshold percentage (for example 90%), which may make the occlusion detection mechanism inaccurate. Some examples use a combination of the above factors in determining whether the method should be aborted.
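The abort decision can be expressed as a simple predicate over the signals listed above. In the sketch below every threshold is an illustrative placeholder (the text only fixes the 50% grid-cell fraction and the 90% foreground fraction as examples):

```python
def should_abort(match_score, frame_mean_delta, cell_variances,
                 foreground_fraction,
                 min_match_score=0.5, max_mean_delta=30.0,
                 max_cell_variance=2500.0, max_foreground=0.9):
    """Check abort criteria; cell_variances is a NumPy array of per-cell
    variances, the other inputs are scalars computed for this frame."""
    if match_score < min_match_score:
        return True  # background tracking/matching failed
    if frame_mean_delta > max_mean_delta:
        return True  # global properties of the frame changed too much
    high_var_fraction = (cell_variances > max_cell_variance).mean()
    if high_var_fraction > 0.5:
        return True  # too many grid cells have an unstable background model
    if foreground_fraction > max_foreground:
        return True  # nearly everything classified as foreground
    return False
```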

If it is determined that the method should be aborted, then the user is informed 613 and the method terminates without the ink being updated.

If it is determined that the method should not be aborted, a comparison is computed 614 between the second region and the model (if the comparison has not already been generated), to decide which regions of the second region are occluded regions (i.e. which parts of the ink are to be hidden). The model may optionally be updated to include those parts of the second region which are not occluded regions.

Finally, the ink is updated 616 using the comparison, so that regions of ink which are occluded by a foreground object are hidden, in order to give the ink the same “depth” as the background object on which it was applied.

FIG. 7 illustrates various components of an exemplary computing-based device 700 which are implemented as any form of a computing and/or electronic device, and in which embodiments of image processing apparatus with an occlusion detecting and ink updating facility are implemented in some examples.

Computing-based device 700 comprises one or more processors 724 which are microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to carry out image processing with occlusion detection. In some examples, for example where a system on a chip architecture is used, the processors 724 include one or more fixed function blocks (also referred to as accelerators) which implement a part of the method of FIG. 2 or FIG. 6 in hardware (rather than software or firmware). An occlusion detecting component 716 is able to detect occlusion of digital ink in a video as described herein. An ink updating component 717 is able to update the digital ink based on information from the occlusion detecting component 716 as described herein. Platform software comprising an operating system 712 or any other suitable platform software is provided at the computing-based device to enable application software 714 to be executed on the device.

The computer executable instructions are provided using any computer-readable media that is accessible by computing based device 700. Computer-readable media includes, for example, computer storage media such as memory 710 and communications media. Computer storage media, such as memory 710, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or the like. Computer storage media includes, but is not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM), electronic erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that is used to store information for access by a computing device. In contrast, communication media embody computer readable instructions, data structures, program modules, or the like in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media. Therefore, a computer storage medium should not be interpreted to be a propagating signal per se. Although the computer storage media (memory 710) is shown within the computing-based device 700 it will be appreciated that the storage is, in some examples, distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 722).

The computing-based device 700 also comprises an input interface 706 which receives inputs from a capture device 702 such as a video camera, depth camera, color camera, web camera or other capture device 702. The input interface 706 also receives input from one or more user input devices 726. The computing-based device 700 comprises an output interface 708 arranged to output display information to a display device 704 which may be separate from or integral to the computing-based device 700. A non-exhaustive list of examples of user input device 726 is: a stylus, a mouse, keyboard, camera, microphone or other sensor. In some examples the user input device 726 detects voice input, user gestures or other user actions and provides a natural user interface (NUI). This user input may be used to change values of parameters, view responses computed using similarity metrics, specify templates, view images, draw electronic ink on an image, specify images to be joined and for other purposes. In an embodiment the display device 704 also acts as the user input device 726 if it is a touch sensitive display device. The output interface 708 outputs data to devices other than the display device in some examples, e.g. a locally connected printing device (not shown in FIG. 7).

Any of the input interface 706, output interface 708, display device 704 and the user input device 726 may comprise NUI technology which enables a user to interact with the computing-based device in a natural manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls and the like. Examples of NUI technology that are provided in some examples include but are not limited to those relying on voice and/or speech recognition, touch and/or stylus recognition (touch sensitive displays), gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Other examples of NUI technology that are used in some examples include intention and goal understanding systems, motion gesture detection systems using depth cameras (such as stereoscopic camera systems, infrared camera systems, red green blue (RGB) camera systems and combinations of these), motion gesture detection using accelerometers/gyroscopes, facial recognition, three dimensional (3D) displays, head, eye and gaze tracking, immersive augmented reality and virtual reality systems, and technologies for sensing brain activity using electric field sensing electrodes (electroencephalogram (EEG) and related methods).

Alternatively or in addition to the other examples described herein, examples include any combination of the following:

An image processing apparatus configured to detect occlusion of digital ink in a digitally annotated video, comprising a processor configured to: receive a video; receive digital ink annotated on the video; for at least a first frame of the video, compute a model describing pixels of a bounding region of the digital ink; for a frame of the video, compute a second region corresponding to the bounding region; compute a comparison between the second region and the model; and update the ink using the comparison. This enables the ink to be updated based on the comparison so that occlusion of the ink is calculated with reduced computational complexity.

The image processing apparatus described above, wherein the comparison comprises a computed similarity value between pixels of the second region and the model.

The image processing apparatus described above, wherein the processor is further configured to update the model to describe pixels of the second region which have a similarity value above a predetermined threshold value if data from the second region has not yet been included in the model. This allows the model to evolve as the bounding region evolves (for example if lighting conditions for the object on which the digital ink is applied change in subsequent frames).

The image processing apparatus described above, wherein the model is updated according to a learning algorithm having a learning rate. The learning rate enables the model to evolve where the properties of the background may also be changing.

The image processing apparatus described above, wherein the processor is configured to change the learning rate. A higher learning rate compensates for additional distortions in the statistical model. For example, the processor may be configured to raise the learning rate for a number of frames if the computed second bounding region has a large estimated translation, since the translation may be less exact and thus the model may need to be adjusted faster. In another example, if global changes in exposure are detected in the video, they are first compensated in the model and the learning rate is raised for a number of frames to compensate for small errors in the compensation.

The image processing apparatus described above, wherein the comparison comprises an occlusion map indicating pixels of the second region which have a similarity value below a predetermined threshold value. This occlusion map is simple to compute based on the comparison.

The image processing apparatus described above, wherein the comparison comprises an occlusion probability map calculated from the similarity values of the individual pixels of the second region. The probability map enables a more accurate estimate of occlusion to be calculated.

The image processing apparatus described above, wherein the probability map is filtered using a cross bi-lateral filter to generate the comparison. The cross bi-lateral filter lowers the frequency of anomalies in the occlusion probability map (for example a pixel which is marked as “not occluded” on the probability map, but which is surrounded by occluded pixels).

The image processing apparatus described above, wherein the processor is configured to update the ink by applying the probability map to the ink. The updated ink therefore has occluded regions, non-occluded regions and partially occluded regions, which may smooth out the ink at the boundary of an occluding object.

The image processing apparatus described above, wherein the processor is configured to compute the comparison by: generating an occlusion map indicating occluded pixels of the second region which have a similarity value below a predetermined threshold value; and for each pixel not indicated as occluded, marking the pixel as occluded if the number of occluded pixels in a selected neighborhood of the pixel is above a predetermined threshold value. This configuration lowers the frequency of anomalies in the occlusion probability map.

The image processing apparatus described above, wherein the processor is configured to update the ink by segmenting the comparison between the second region and the first model using a graph cut algorithm.

The image processing apparatus described above, wherein the model comprises a statistical model of at least one channel of the pixels in the bounding region, and the processor is configured to compute the comparison by comparing the intensity of the at least one channel of the pixels in the second region to the model. The statistical model provides an apparatus for efficiently and/or accurately differentiating between an object on which the digital ink is applied, and an occluding object.

The image processing apparatus described above, wherein: the model comprises a set of sub-models describing pixels of the at least first frame in a set of respective first sub-regions making up the bounding region; the second region comprises a set of second sub-regions corresponding to the set of first sub-regions; and the processor is configured to compute a comparison between each pixel of each second sub-region and the corresponding sub-model.

The image processing apparatus described above, wherein the processor is configured to interpolate the comparison at the boundaries between neighboring sub-regions. This feature enables the apparatus to smooth differences in the comparisons at the boundaries, so that the ink is further inhibited from following the lines of the boundaries.

The image processing apparatus described above, wherein each sub-model describes pixels of a cell of a grid. This reduces the effect of any boundary problems of the sub-models.

The image processing apparatus described above, wherein the processor is further configured to, prior to updating the ink or prior to generating the comparison: check a set of criteria for the video, the criteria being indicative of the occlusion mechanism being effective; and if the criteria are not fulfilled, then abort any remaining steps for detecting occlusion. This enables the apparatus to disable the occlusion estimation mechanism in cases where the method may not be successful or yield accurate results. For example, the criteria not being fulfilled may indicate that an object on which the digital ink is applied is changing rapidly enough between frames that the accuracy of the comparison may be decreased.

The image processing apparatus described above, wherein the processor is configured to compute a second region corresponding to the bounding region by: selecting a plurality of template pixels of the bounding region; matching the template pixels of the bounding region to corresponding matching template pixels in the subsequent frame; generating a homography transform matrix using the matching; and applying the homography transform matrix to the bounding region to generate the second region. This allows the apparatus to efficiently track an object which the ink is placed on, in order to allow the ink to move with the object in the subsequent frame.

The image processing apparatus described above, wherein the processor is configured to match the template pixels by searching the subsequent frame for a similar plurality of pixels, using template matching.

A computer-implemented method for detecting occlusion of digital ink in a digitally annotated video, comprising the steps of: receiving a video; receiving digital ink annotated on the video; for at least a first frame of the video, computing a model describing pixels of a bounding region of the ink; for a frame of the video, computing a second region corresponding to the bounding region; computing a comparison between the second region and the model; and updating the ink using the comparison. The method allows the ink to be updated based on the comparison so that occlusion of the ink is calculated with reduced computational complexity.

One or more device-readable media with device-executable instructions that, when executed by a computing system, direct the computing system to perform operations comprising: receiving a video; receiving digital ink annotated on the video; for at least a first frame of the video, computing a model describing pixels of a bounding region of the ink; for a frame of the video, computing a second region corresponding to the bounding region; computing a comparison between the second region and the model; and updating the ink using the comparison. The ink is updated based on the comparison so that occlusion of the ink is calculated with reduced computational complexity.

An image processing apparatus comprising: means for receiving a video; means for receiving digital ink annotated on the video; for at least a first frame of the video, means for computing a model describing pixels of a bounding region of the ink; for a subsequent frame of the video, means for computing a second region corresponding to the bounding region; means for computing a comparison between the second region and the model; and means for updating the ink using the comparison. For example, the means for receiving the digital ink is the memory 710 or processor 724 or a combination of the memory 710 and processor 724. For example, the means for computing the model, computing a second region and computing the comparison is the occlusion detecting component 104 when configured to carry out the operation of all or part of FIG. 2 or FIG. 6. For example, the means for updating the ink is the ink updating component 106 when configured to carry out the operation of all or part of FIG. 2 or FIG. 6.

The examples illustrated and described herein, as well as examples not specifically described herein but within the scope of aspects of the disclosure, constitute exemplary means for detecting occlusion of digital ink of an annotated video. For example, the elements illustrated in FIG. 1, such as when encoded to perform the operations illustrated in FIG. 2 or FIG. 6, constitute exemplary means for detecting occlusion of digital ink of an annotated video, and exemplary means for updating the digital ink using information derived from the detected occlusion.

The term ‘computer’ or ‘computing-based device’ is used herein to refer to any device with processing capability such that it executes instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the terms ‘computer’ and ‘computing-based device’ each include personal computers (PCs), servers, mobile telephones (including smart phones), tablet computers, set-top boxes, media players, games consoles, personal digital assistants, wearable computers, and many other devices.

The methods described herein are performed, in some examples, by software in machine readable form on a tangible storage medium, e.g. in the form of a computer program comprising computer program code means adapted to perform all the operations of one or more of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable medium. The software is suitable for execution on a parallel processor or a serial processor such that the method operations may be carried out in any suitable order, or simultaneously.

This acknowledges that software is a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.

Those skilled in the art will realize that storage devices utilized to store program instructions are optionally distributed across a network. For example, a remote computer is able to store an example of the process described as software. A local or terminal computer is able to access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that, by utilizing conventional techniques known to those skilled in the art, all or a portion of the software instructions may be carried out by a dedicated circuit, such as a digital signal processor (DSP), programmable logic array, or the like.

Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.

The operations of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.

The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.

It will be understood that the above description is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the scope of this specification.

1. An image processing apparatus configured to detect occlusion of digital ink in a digitally annotated video, comprising a processor configured to: receive a video; receive digital ink annotated on the video; for at least a first frame of the video, compute a model describing pixels in a bounding region of the digital ink; for a frame of the video, compute a second region corresponding to the bounding region; compute a comparison between the second region and the model; and update the ink using the comparison.